Approximate String Matching

Authors

  • Edgar Chávez
  • Gonzalo Navarro

Abstract

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits finding the R occurrences of a pattern of length m in a text of length n in average time O(m log^2 n + m^2 + R), using O(n log n) space and O(n log^2 n) index construction time. This complexity improves by far over all previous methods. We also show a simpler scheme needing O(n) space.
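The abstract relies on the edit distance being a metric. As a minimal sketch of that ingredient only (not the paper's index), the following computes the standard Levenshtein distance and spot-checks symmetry and the triangle inequality on a few sample words. The function name `edit_distance` and the sample words are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute).

    Illustrative helper, not the paper's code. Keeps only one row of the
    dynamic-programming table at a time.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


# Spot-check the metric properties on hypothetical sample words.
words = ["survey", "surgery", "sugary", "matching"]
for x, y, z in product(words, repeat=3):
    assert edit_distance(x, y) == edit_distance(y, x)
    assert edit_distance(x, z) <= edit_distance(x, y) + edit_distance(y, z)

print(edit_distance("kitten", "sitting"))  # 3
```

The triangle inequality checked here is what allows a proximity query to discard candidates without computing their exact distance to the pattern.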

Similar resources

A Metric Index for Approximate String Matching

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the occ occurrenc...

A Consensus Algorithm for Approximate String Matching

Approximate string matching (ASM) is a well-known computational problem with important applications in database searching, plagiarism detection, spelling correction, and bioinformatics. The two main issues with most ASM algorithms are (1) computational complexity, and (2) low specificity due to a large amount of false positives being reported. In this paper, a very efficient ASM method is propo...

An empirical evaluation of a metric index for approximate string matching

In this paper, we evaluate a metric index for the approximate string matching problem based on suffix trees, proposed by Gonzalo Navarro and Edgar Chávez [9]. Suffix trees are used during index construction to generate the intermediate data (pivot table) to be indexed, and during query processing. One of the main problems with suffix trees is their space requirements. To address this, we propo...
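The pivot table mentioned above can be illustrated with a generic pivot-based filter. This is a rough sketch under simplifying assumptions: it indexes a plain list of strings rather than suffix tree nodes, and the names `build_pivot_table` and `range_query`, the pivots, and the sample data are all hypothetical. It shows only how precomputed distances to pivots let the triangle inequality rule out candidates before the exact distance is computed.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (illustrative helper)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_pivot_table(items, pivots):
    """Precompute the distance from every indexed string to every pivot."""
    return {s: [edit_distance(s, p) for p in pivots] for s in items}


def range_query(query, radius, pivots, table):
    """Return indexed strings within `radius` edits of `query`."""
    q_dists = [edit_distance(query, p) for p in pivots]
    results = []
    for s, row in table.items():
        # Triangle inequality: |d(q, p) - d(s, p)| <= d(q, s) for any pivot p,
        # so a gap larger than the radius proves s cannot match.
        if any(abs(dq - ds) > radius for dq, ds in zip(q_dists, row)):
            continue
        # Surviving candidates are verified with the exact distance.
        if edit_distance(query, s) <= radius:
            results.append(s)
    return results


# Hypothetical data: both the strings and the choice of pivots are made up.
items = ["survey", "surgery", "sugary", "matching", "marching", "metric"]
pivots = ["survey", "metric"]
table = build_pivot_table(items, pivots)
print(range_query("matchin", 1, pivots, table))  # ['matching']
```

The filter never produces false negatives, because a candidate is discarded only when the inequality proves it lies outside the radius. How many candidates it eliminates depends on the choice of pivots.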

Data structures and algorithms for approximate string matching

This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.

Simulation of NFA in Approximate String and Sequence Matching

We present a detailed description of the simulation of nondeterministic finite automata (NFA) for approximate string matching. This simulation uses bit parallelism, and the algorithm used is called the Shift-Or algorithm. Using knowledge of the simulation of NFA by the Shift-Or algorithm, we design a modification of the Shift-Or algorithm for approximate string matching using generalized Levenshtein distance and a modification for...
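For orientation, here is a minimal sketch of the basic Shift-Or algorithm for exact matching, which simulates the pattern's NFA with one bit per state. It does not implement the approximate-matching modifications the abstract describes; the function name `shift_or_search` is an illustrative assumption.

```python
def shift_or_search(text: str, pattern: str) -> list[int]:
    """Return start positions of exact occurrences of `pattern` in `text`.

    Bit i of `state` is 0 when pattern[:i + 1] matches the text ending at
    the current position (0 means "active" in Shift-Or).
    """
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1

    # masks[ch] has bit i cleared exactly where pattern[i] == ch.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, all_ones) & ~(1 << i)

    state = all_ones          # no prefix matched yet
    accept = 1 << (m - 1)     # bit of the final NFA state
    hits = []
    for pos, ch in enumerate(text):
        # Shift advances every active state; OR kills states whose next
        # pattern character differs from ch.
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if (state & accept) == 0:
            hits.append(pos - m + 1)
    return hits


print(shift_or_search("abracadabra", "abra"))  # [0, 7]
```

Python integers are unbounded, so this sketch works for any pattern length. In a fixed-width language, the pattern would need to fit in a machine word or be split across several.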

Publication date: 2008